Set-oriented data mining in relational databases

Author: Houtsma Maurice
Swami Arun
Publication venue: North Holland
Publication date: 01/01/1995

Data mining is an important real-life application for businesses. It is critical to find efficient ways of mining large data sets. In order to benefit from the experience with relational databases, a set-oriented approach to mining data is needed. In such an approach, the data mining operations are expressed in terms of relational or set-oriented operations. Query optimization technology can then be used for efficient processing.

In this paper, we describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and thus may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries, and discuss optimization of these algorithms. After analytical evaluation, an algorithm named SETM emerges as the algorithm of choice. Algorithm SETM uses only simple database primitives, viz., sorting and merge-scan join. Algorithm SETM is simple, fast, and stable over the range of parameter values. It is easily parallelized, and we suggest several additional optimizations. The set-oriented nature of Algorithm SETM makes it possible to develop extensions easily, and its performance makes it feasible to build interactive data mining tools for large databases.
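The abstract states that SETM needs only sorting and a merge-scan join. The following is a minimal in-memory Python sketch of that loop, assuming a SALES relation of (tid, item) pairs and an absolute support count; the function names are hypothetical, and the paper itself expresses these steps as SQL queries rather than Python. Each round joins the surviving (tid, itemset) rows with SALES on the transaction id, sorts on the itemset to count support, and keeps only the large itemsets for the next round.

```python
from itertools import groupby
from operator import itemgetter


def merge_scan_join(r_prev, sales):
    """Merge-scan join of R_{k-1} and SALES on the transaction id.

    Both inputs must be sorted on tid. Rows of r_prev are (tid, itemset),
    rows of sales are (tid, item). An itemset is extended only with items
    larger than its last item, so each k-itemset is generated once per tid.
    """
    out = []
    i = j = 0
    while i < len(r_prev) and j < len(sales):
        tid_left, tid_right = r_prev[i][0], sales[j][0]
        if tid_left < tid_right:
            i += 1
        elif tid_left > tid_right:
            j += 1
        else:
            # Find the block of SALES rows that share this tid.
            j_end = j
            while j_end < len(sales) and sales[j_end][0] == tid_left:
                j_end += 1
            # Combine every R_{k-1} row with this tid with that block.
            while i < len(r_prev) and r_prev[i][0] == tid_left:
                itemset = r_prev[i][1]
                for _, item in sales[j:j_end]:
                    if item > itemset[-1]:
                        out.append((tid_left, itemset + (item,)))
                i += 1
            j = j_end
    return out


def count_and_filter(r_candidates, min_support):
    """Sort on the itemset, count each group, and keep the large itemsets."""
    by_itemset = sorted(r_candidates, key=itemgetter(1))
    counts = {items: sum(1 for _ in group)
              for items, group in groupby(by_itemset, key=itemgetter(1))}
    large = {items: c for items, c in counts.items() if c >= min_support}
    # R_k keeps only rows with a large itemset, re-sorted on tid for the next join.
    r_k = sorted(row for row in r_candidates if row[1] in large)
    return r_k, large


def setm(sales, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    sales = sorted(set(sales))  # (tid, item) pairs, sorted on tid, then item
    r, large = count_and_filter([(tid, (item,)) for tid, item in sales], min_support)
    result = dict(large)
    while r:
        r, large = count_and_filter(merge_scan_join(r, sales), min_support)
        result.update(large)
    return result
```

For example, with transactions {a, b, c}, {a, b} and {a, c} and a minimum support of 2, the sketch reports a, b, c, {a, b} and {a, c} as large; {b, c} and {a, b, c} occur only once and are dropped.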

CiteSeerX

Crossref

University of Twente Research Information

Algebraic optimization of recursive queries

Author: Apers Peter M.G.
Houtsma M.A.W.
Houtsma Maurice A.W.
Publication venue: North Holland
Publication date: 01/01/1992

Over the past few years, much attention has been paid to deductive databases. They offer a logic-based interface, and allow formulation of complex recursive queries. However, they do not offer appropriate update facilities, and do not support existing applications. To overcome these problems an SQL-like interface is required besides a logic-based interface.

In the PRISMA project we have developed a tightly-coupled distributed database, on a multiprocessor machine, with two user interfaces: SQL and PRISMAlog. Query optimization is localized in one component: the relational query optimizer. Therefore, we have defined an eXtended Relational Algebra that allows recursive query formulation and can also be used for expressing executable schedules, and we have developed algebraic optimization strategies for recursive queries. In this paper we describe an optimization strategy that rewrites regular (in the context of formal grammars) mutually recursive queries into standard Relational Algebra and transitive closure operations. We also describe how to push selections into the resulting transitive closure operations.

The reason we focus on algebraic optimization is that, in our opinion, the new generation of advanced database systems will be built starting from existing state-of-the-art relational technology, instead of building a completely new class of systems.
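To make "pushing a selection into a transitive closure" concrete, here is a generic Python sketch. It is not the paper's eXtended Relational Algebra rewrite rules. It shows semi-naive evaluation of the closure R+ of an edge relation R(src, dst), and then the selection src = c evaluated with the selection moved inside the fixpoint, so only tuples reachable from c are ever derived. The function names are hypothetical.

```python
from collections import defaultdict


def successors(edges):
    """Index the edge relation R(src, dst) on src."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
    return adj


def transitive_closure(edges):
    """Semi-naive evaluation of R+: only the tuples that were new in the last
    round (delta) are joined with R again, until no new tuples appear."""
    adj = successors(edges)
    closure = set(edges)
    delta = set(edges)
    while delta:
        derived = {(x, z) for (x, y) in delta for z in adj.get(y, ())}
        delta = derived - closure
        closure |= delta
    return closure


def closure_from(edges, source):
    """sigma_{src = source}(R+) with the selection pushed into the fixpoint:
    the iteration starts at `source` instead of at every tuple of R."""
    adj = successors(edges)
    reached = set()
    frontier = set(adj.get(source, ()))
    while frontier:
        reached |= frontier
        frontier = {z for y in frontier for z in adj.get(y, ())} - reached
    return {(source, z) for z in reached}
```

Both routes return the same tuples for source = c. The pushed-down version never touches the parts of the graph that are unreachable from c, which is the kind of saving the abstract is after.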

CiteSeerX

University of Twente Research Information

Data fragmentation for parallel transitive closure strategies

Author: Apers Peter M.G.
Houtsma M.A.W.
Houtsma Maurice A.W.
Schipper Gideon L.V.
Publication venue: IEEE
Publication date: 01/01/1993

Addresses the problem of fragmenting a relation to make the parallel computation of the transitive closure efficient, based on the disconnection set approach. To better understand this design problem, the authors focus on transportation networks. These are characterized by loosely interconnected clusters of nodes with a high internal connectivity rate. Three requirements that have to be fulfilled by a fragmentation are formulated, and three different fragmentation strategies are presented, each emphasizing one of these requirements. Some test results are presented to show the performance of the various fragmentation strategies.

University of Twente Research Information

Preface

Author: Apers Peter
Blanken Henk
Houtsma Maurice
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1997

University of Twente Research Information

Complex transitive closure queries on a fragmented graph

Author: Apers Peter M.G.
Ceri Stefano
Houtsma Maurice A.W.
Publication venue: Springer
Publication date: 01/01/1990

In this paper we study the reformulation of transitive closure queries on a fragmented graph. We split a query into several subqueries, each requiring only a fragment of the graph. We prove this reformulation to be correct for shortest path and bill of material queries. Here we describe the reformulation for an abstract graph; elsewhere we have described an actual implementation of our approach and some promising simulation results.

We view the study of distributed computation of transitive closure queries as a result of the trend towards distributed computation. First selections were distributed to fragments of a relation, then fragmentation was used to compute joins in a distributed way, and now we are studying distributed computation of transitive closure queries. This should result in a deeper insight into the use and possible benefit of parallelism. Our work may be used in ordinary distributed databases as well as advanced multiprocessor database machines, such as PRISMA.

Although this research was started to make efficient use of distributed computation, it turns out to be beneficial in a central environment as well. This is due to the introduction of extra selections, stemming from an appropriate fragmentation, which focus the computation on relevant data.
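As an illustration of splitting a shortest path query into fragment-local subqueries, here is a Python sketch under stated assumptions. It is not the paper's exact reformulation. It assumes a graph cut into two edge fragments whose only shared nodes form the disconnection set, with no edges between the non-shared nodes of different fragments, and a precomputed table of distances between disconnection-set nodes. That table is named `d_to_d` here, a hypothetical input. Any path from a (in fragment 1) to b (in fragment 2) first enters the disconnection set at some d1 and last leaves it at some d2, so the answer is the minimum of dist_F1(a, d1) + d_to_d(d1, d2) + dist_F2(d2, b).

```python
import heapq
from collections import defaultdict

INF = float("inf")


def dijkstra(edges, source):
    """Shortest path lengths from source over directed weighted edges (u, v, w)."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, INF):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def fragmented_shortest_path(frag1, frag2, dset, d_to_d, a, b):
    """Shortest a->b distance, with a only in fragment 1 and b only in fragment 2.

    The query is split into a subquery on frag1 (a to d1), a lookup in
    d_to_d (d1 to d2, 0 when d1 == d2) and a subquery on frag2 (d2 to b).
    """
    from_a = dijkstra(frag1, a)                   # touches fragment 1 only
    best = INF
    for d2 in dset:
        to_b = dijkstra(frag2, d2).get(b, INF)    # touches fragment 2 only
        for d1 in dset:
            best = min(best, from_a.get(d1, INF) + d_to_d.get((d1, d2), INF) + to_b)
    return best
```

In a small test one can fill `d_to_d` by running `dijkstra` on the whole graph from each disconnection-set node. A real system would store this information alongside the fragmentation rather than recompute it per query.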

University of Twente Research Information

Conceptual data models versus knowledge graphs

Author: Hoede Cornelis
Houtsma Maurice A.W.
Publication venue: University Library/University of Twente
Publication date: 01/01/1991

University of Twente Research Information

Data fragmentation for parallel transitive closure strategies

Author: Apers Peter M.G.
Houtsma M.A.W.
Houtsma Maurice A.W.
Schipper Gideon L.V.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/1993

A topic that is currently inspiring a lot of research is parallel (distributed) computation of transitive closure queries. In [10] the disconnection set approach has been introduced as an effective strategy for such a computation. It involves reformulating a transitive closure query on a relation into a number of transitive closure queries on smaller fragments; these queries can then execute independently on the fragments, without need for communication and without computing the same tuples at more than one processor. Now that such effective strategies have been developed, the next problem is that of developing adequate data fragmentation strategies for these approaches. This is a difficult problem, but of paramount importance to the success of these approaches. We discuss the issues that influence data fragmentation. We present a number of algorithms, each focusing on one of the important issues. We discuss the pros and cons of the algorithms, and we give some results of applying the algorithms to different types of graphs. These results show to what extent the algorithms indeed meet the goals we set out.
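When comparing fragmentation strategies it helps to see which disconnection sets a given fragmentation induces. A small Python sketch follows, assuming that a fragmentation assigns each edge of the relation to exactly one fragment and that a node shared by edges of two fragments belongs to their disconnection set. The functions and the two measures (disconnection-set contents and edges per fragment) are illustrative choices, not the paper's stated requirements, which the abstract does not list.

```python
from collections import defaultdict
from itertools import combinations


def disconnection_sets(fragment_of_edge):
    """fragment_of_edge maps each edge (u, v) to a fragment id. A node used by
    edges of two fragments belongs to the disconnection set of that pair."""
    fragments_of_node = defaultdict(set)
    for (u, v), frag in fragment_of_edge.items():
        fragments_of_node[u].add(frag)
        fragments_of_node[v].add(frag)
    dsets = defaultdict(set)
    for node, frags in fragments_of_node.items():
        for fi, fj in combinations(sorted(frags), 2):
            dsets[(fi, fj)].add(node)
    return dict(dsets)


def fragment_sizes(fragment_of_edge):
    """Number of edges per fragment, a rough proxy for per-processor work."""
    sizes = defaultdict(int)
    for frag in fragment_of_edge.values():
        sizes[frag] += 1
    return dict(sizes)
```

On a path a-b-c-d-e, cutting at c gives one disconnection node between the two fragments. Alternating edges between the fragments gives equally sized fragments but a disconnection set of three nodes. This is the kind of trade-off between competing goals that the abstract alludes to.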

Data and Knowledge model: A proposal

Author: Apers Peter M.G.
Houtsma Maurice A.W.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/09/1987

University of Twente Research Information

A Survey of Parallel Execution Strategies for Transitive Closure and Logic Programs

Author: Cacace Filippo
Houtsma Maurice
Ceri Stefano

An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. In this paper, we survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We obser..
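The hash-based fragmentation mentioned in the abstract can be sketched in a few lines of Python. Both relations are fragmented with the same hash function on the join attribute, so matching tuples always land in fragments with the same index, and each pair of fragments can be joined independently. The relation layouts and function names are hypothetical, and the sequential loop over fragments only stands in for separate processors.

```python
from collections import defaultdict


def hash_fragment(rows, key, n):
    """Split rows into n fragments by hashing the attribute at position `key`."""
    fragments = [[] for _ in range(n)]
    for row in rows:
        fragments[hash(row[key]) % n].append(row)
    return fragments


def local_join(r_rows, s_rows, r_key, s_key):
    """Hash join of two row lists on r_key = s_key, concatenating matching tuples."""
    index = defaultdict(list)
    for s in s_rows:
        index[s[s_key]].append(s)
    return [r + s for r in r_rows for s in index.get(r[r_key], [])]


def parallel_hash_join(r_rows, s_rows, r_key, s_key, n):
    """Join fragment i of R only with fragment i of S; the union of the n
    local results equals the full join because both sides use the same hash."""
    r_frags = hash_fragment(r_rows, r_key, n)
    s_frags = hash_fragment(s_rows, s_key, n)
    result = []
    for r_frag, s_frag in zip(r_frags, s_frags):  # one iteration per "processor"
        result.extend(local_join(r_frag, s_frag, r_key, s_key))
    return result
```

Recursive queries complicate this picture, because each iteration of a fixpoint produces new tuples that may have to be re-fragmented. That is the setting the survey examines.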

CiteSeerX